COVID-19 Data Analysis

Modello Predittivo a 7gg

Max Pierini, Sandra Mazzoli, Alessio Pamovio

START: 2020-09-02 10:11:06.119719

Abstract

A simple method is presented to predict new cases and Percent Positive, estimated on effective reproduction number $R_t$ of COVID-19 in italian regions with a Markov chain Monte Carlo and Poisson likelihood parametrized on daily new cases.

Method

New cases $y_r$ for each $r$ italian regions (source Dipartimento Protezione Civile)will be filtered with Hampel filter for outliers detection (gaussian window of 7 days, $\sigma=2$) and smoothed with rolling mean (gaussian window of 14 days, $\sigma=22$). Smoothed new cases will be adjusted to be $>0$ to avoid negative values (due to data error corrections).

For each day $t=[0,T]$ (where $T$ is the last observed day), smoothed new cases $y_{r,t}$ will be supposed distributed as Poisson with $\lambda_{r,t}$ parameter

$$ y_{r,t} \sim \mathcal{P}(\lambda_{r,t}) $$

where $\lambda_{r,t}$ is defined by the serial interval inverse $\gamma$, previous day smoothed new cases $k_{r,t-1}$ and effective reproduction number in time $R_{r,t}$ (ref: Bettencourt & Ribeiro 2008)

$$ \lambda_{r,t} = k_{r,t-1} e^{\gamma (R_{r,t} - 1)} $$

The serial interval SI is supposed to be distributed as Gamma, with mean $\mu=7.5$ and standard deviation $\sigma=3.4$ (ref: Li, Ghuan et Al. 2020a)

$$ \mathbf{SI} \sim \Gamma(\mu_{=7.5}, \sigma_{=3.4}) $$

so that $\gamma$ is distributed as Inverse Gamma

$$ \gamma \sim \Gamma^{-1}(\mu_{=7.5}, \sigma_{=3.4}) $$

Parameters $R_{{r,t}}$ will be distributed Half Normal with mean equal to previous day posteriors $R_{{r,t-1}}$ and overarching unknown precision $\tau$

$$ R_{{r,t}} \sim \mathcal{{N}}^+(R_{{r,t-1}}, \tau) $$

where, first day $R_{{r,0}}$ (outcome) is set to zero

$$ R_{{r,0}} = 0 $$

and $\tau$ priors are distributed Gamma (partial pooling between regions) with shape and rate based on results of preliminary tests with uniformative $\tau$ distribution

$$ \tau \sim \Gamma(1000, 100) $$

so that the standard deviation $\sigma$ will be

$$ \sigma = \frac{1}{\sqrt{\tau}} $$

If previous new cases are zero $k_{r,t-1}=0$, parameter $R_{r,t}$ is undefined, given the chosen function for $\lambda_{r,t}$ parameter of Poisson likelihood, even if it should be $R_{r,t}=0$ (no new cases means null effective reproduction number). Thus, in these cases, priors of $R_{r,t}$ will be forced to

$$ R_{r,t} \sim \mathcal{N}^+(0, \tau) \;,\; k_{r,t-1}=0 $$

and previous new cases will be forced to $k_{r,t-1}=1$, so that $\lambda_{r,t}$ will be

$$ \lambda_{r,t} = e^{\gamma( \mathcal{N}^+(0, \tau) - 1 )} \;,\; k_{r,t-1}=0 $$

At the same time, $R_t$ for Italy as nation will be calculated.

Percent positive (PP), aka Percent Positive Rate or Positivity Rate, can be considered as an index of disease transmission (ref: Johns Hopkins).

The index is calculated as

$$ \mathbf{PP} = \frac{ \Delta_\textrm{positive} }{ \Delta_\textrm{tested} } \cdot 100 $$

where $\Delta_\textrm{positive}$ is daily new amount of positive tests and $\Delta_\textrm{tested}$ is daily new amount of total tests.

When it's too high means that more tests are needed and/or pharmaceutical (PI) or non-pharmaceutical (NPI) interventions should be seriously considered to reduce transmission.

The "alarm threshold" has been established at 5%. World Health Organization recommend to relax COVID-19 NPI (lockdown and high level social distancing rules) if PP keeps below the threshold for at least two weeks.

We can add an "alert" threshold, between 3% and 5%: if PP raises over 3%, governments should consider a likely forthcoming alarm scenario. Below 3% can be considered in a safe area.

Percent positive index for Italy is here calculated using Dipartimento di Protezione Civile published data (ref: GitHub) filterd with Hampel filter for outliers (gaussian window of 14 days, $\sigma=5$) and smoothed with rolling mean (gaussian window of 7 days, $\sigma=2$).

To predict $y_{\mathrm{pred},{T+p}}$ new cases in Italy and regions for the next $p=[1 ... P]$ days, the same likelihood will be used, distributing new cases as Poisson with $\lambda_{\mathrm{pred},{T+p}}$ parameter

$$ y_{\mathrm{pred},{T+p}} \sim \mathcal{P}(\lambda_{\mathrm{pred},{T+p}}) \;,\; p=[1 ... P] $$

where $\lambda_{\mathrm{pred},{T+p}}$ is parametrized on $\gamma$, the posteriors of estimated $y_{\mathrm{pred},{T+p-1}}$ and the $R_{T+p}^*$, where $R_{T}^*$ is the mean of estimated effective reproduction number in the last observed $d$ days

$$ R_t^* = \frac{ \sum_{T-d}^{T}{R_i} }{ d } $$$$ \lambda_{\mathrm{pred},t} = y_{\mathrm{pred},{T+p-1}} \cdot e^{\gamma (R_{T+p}^* - 1)} $$

where $y_{\mathrm{pred},{T+p-1}}$ are distributed Half Normal with uknown precision

$$ y_{\mathrm{pred},{T+p-1}} \sim \mathcal{N}^+ \left( \mu=y_{\mathrm{pred},{T+p-1}} , \tau_{\mathrm{pred}} \right) $$

and

$$ \tau_{\mathrm{pred}} \sim \Gamma( 1, 1) $$

To predict best and worst scenario, $R_p^*$ will be supposed to be respectively higher and lower by $\varepsilon$

$$ R^*_{T+p,\mathrm{best}} = R_{T+p}^* - \frac{p}{P} \varepsilon \;,\; p=[1...P] $$$$ R^*_{T+p,\mathrm{worst}} = R_{T+p}^* + \frac{p}{P} \varepsilon \;,\; p=[1...P] $$

where $\varepsilon$ is the mean of previously estimated $R_t$ variation within $P$ days, rounded to 2 decimals (see Epsilon).

Predicted $\mathbf{PP}_{T+p}$ will be calculated as

$$ \mathbf{PP}_{T+p} = \frac{ y_{\mathrm{pred},T+p} }{ \Delta_{\mathrm{tested}}^* } \cdot 100 $$

where $y_{\mathrm{pred},{T+p}}$ are estimated posteriors of predicted new cases and $\Delta_{\mathrm{tested}}^*$ is the last value of filtered and smoothed tested daily cases.

A Markov chain Monte Carlo will be used with Metropolis-Hasting algorithm and Gibbs sampler (adapt 500, warmup 2000, samples 2000, chains 4, thin 1) in Python 3.8.5 with pyjags==1.3.7 and JAGS 4.3.0 for macOS.

Epsilon

Given previously estimated $R_t$ mean with this model and $\Delta_d (R_t)$ the absolute variation of $R_t$ within $d$ days, $\varepsilon$ can be supposed as the mean of $\Delta_d (R_t)$, rounded to 2 decimals.

$$ \varepsilon = \left\lceil \overline{\Delta_d (R_t)} \right\rfloor $$

The mean of $\Delta_7 (R_t)$ is equal to $0.33245 \pm 0.23636$.

$$ \varepsilon = \left \lceil 0.33245 \right \rfloor = 0.33 $$

Model

model {
    ###################################
    # Rt estimation
    ###################################
    # Overarching Rt standard deviation
    tau_R ~ dgamma( 1000 , 100 )
    sigma_R <- 1 / sqrt( tau_R )

    # Serial interval distribution
    SI_mu <- 7.5
    SI_sd <- 3.4
    SI_sh <- SI_mu^2 / SI_sd^2
    SI_ra <- SI_mu / SI_sd^2
    SI ~ dgamma( SI_sh , SI_ra )
    gamma <- 1 / SI

    for ( r in 1:C ) {
        # First Rt prior
        RR[r,1] <- 0
        for ( t in 2:T ) {
            # Rt prior for k>0
            RRpr[r,t] ~ dnorm( RR[r,(t-1)] , tau_R )  T(0,)
            # Rt prior for k=0
            RRnu[r,t] ~ dnorm( 0 , tau_R )  T(0,)

            # Define Rt prior
            RR[r,t] <- ifelse( kR[r,t-1]==0 , RRnu[r,t] , RRpr[r,t] )
            # Avoid k=0 (undefined Rt)
            KR[r,t] <- ifelse( kR[r,t-1]==0, 1 , kR[r,t-1] )

            # Poisson likelihood
            lambdaR[r,t] <- KR[r,t] * exp( gamma * ( RR[r,t] - 1 ) )
            yR[r,t] ~ dpois( lambdaR[r,t] )
        }
    }

    # First Rt prior
    R[1] <- 0
    for ( t in 2:T ) {
        # Rt prior for k>0
        Rpr[t] ~ dnorm( R[t-1] , tau_R )  T(0,)
        # Rt prior for k=0
        Rnu[t] ~ dnorm( 0 , tau_R )  T(0,)

        # Define Rt prior
        R[t] <- ifelse( k[t-1]==0 , Rnu[t] , Rpr[t] )
        # Avoid k=0 (undefined Rt)
        K[t] <- ifelse( k[t-1]==0, 1 , k[t-1] )

        # Poisson likelihood
        lambda[t] <- K[t] * exp( gamma * ( R[t] - 1 ) )
        y[t] ~ dpois( lambda[t] )
    }

    ###################################
    # Predictions
    ###################################
    R_pred <- sum(R[(T-P):T]) / P
    y_pred[1] <- y[T]
    y_pred_hi[1] <- y[T]
    y_pred_lo[1] <- y[T]
    PP[1] <- PPobs
    PP_lo[1] <- PPobs
    PP_hi[1] <- PPobs

    # New cases precision prior
    y_tau ~ dgamma( 1 , 1 )

    for ( r in 1:C ) {
        RR_pred[r] <- sum(RR[r,((T-P):T)]) / P
        yR_pred[r,1] <- yR[r,T]
        yR_pred_hi[r,1] <- yR[r,T]
        yR_pred_lo[r,1] <- yR[r,T]
        PPR[r,1] <- PPRobs[r]
        PPR_lo[r,1] <- PPRobs[r]
        PPR_hi[r,1] <- PPRobs[r]

        for ( p in 2:P ) {
            RR_pred_hi[r,p] <- RR_pred[r] + 0.33 * (p-1) / (P-1)
            RR_pred_lo[r,p] <- RR_pred[r] - 0.33 * (p-1) / (P-1)

            # most likely scenario
            yR_prior[r,p] ~ dnorm( yR_pred[r,p-1] , y_tau )  T(0,)
            lambdaR_pred[r,p] <- yR_prior[r,p] * exp( gamma * ( RR_pred[r] - 1 ) )
            yR_pred[r,p] ~ dpois( lambdaR_pred[r,p] )
            PPR[r,p] <- yR_pred[r,p] / testsR[r] * 100

            # worst scenario
            yR_prior_hi[r,p] ~ dnorm( yR_pred_hi[r,p-1] , y_tau )  T(0,)
            lambdaR_pred_hi[r,p] <- yR_prior_hi[r,p] * exp( gamma * ( RR_pred_hi[r,p] - 1 ) )
            yR_pred_hi[r,p] ~ dpois( lambdaR_pred_hi[r,p] )
            PPR_hi[r,p] <- yR_pred_hi[r,p] / testsR[r] * 100

            # best scenario
            yR_prior_lo[r,p] ~ dnorm( yR_pred_lo[r,p-1] , y_tau )  T(0,)
            lambdaR_pred_lo[r,p] <- yR_prior_lo[r,p] * exp( gamma * ( RR_pred_lo[r,p] - 1 ) )
            yR_pred_lo[r,p] ~ dpois( lambdaR_pred_lo[r,p] )
            PPR_lo[r,p] <- yR_pred_lo[r,p] / testsR[r] * 100
        }
    }

    for ( p in 2:P ) {
        R_pred_hi[p] <- R_pred + 0.33 * (p-1) / (P-1)
        R_pred_lo[p] <- R_pred - 0.33 * (p-1) / (P-1)

        # most likely scenario
        y_prior[p] ~ dnorm( y_pred[p-1] , y_tau )  T(0,)
        lambda_pred[p] <- y_prior[p] * exp( gamma * ( R_pred - 1 ) )
        y_pred[p] ~ dpois( lambda_pred[p] )
        PP[p] <- y_pred[p] / tests * 100

        # worst scenario
        y_prior_hi[p] ~ dnorm( y_pred_hi[p-1] , y_tau )  T(0,)
        lambda_pred_hi[p] <- y_prior_hi[p] * exp( gamma * ( R_pred_hi[p] - 1 ) )
        y_pred_hi[p] ~ dpois( lambda_pred_hi[p] )
        PP_hi[p] <- y_pred_hi[p] / tests * 100

        # best scenario
        y_prior_lo[p] ~ dnorm( y_pred_lo[p-1] , y_tau )  T(0,)
        lambda_pred_lo[p] <- y_prior_lo[p] * exp( gamma * ( R_pred_lo[p] - 1 ) )
        y_pred_lo[p] ~ dpois( lambda_pred_lo[p] )
        PP_lo[p] <- y_pred_lo[p] / tests * 100
    }
}

Results

Italy

Abruzzo

Basilicata

Calabria

Campania

Emilia-Romagna

Friuli Venezia Giulia

Lazio

Liguria

Lombardia

Marche

Molise

P.A. Bolzano

P.A. Trento

Piemonte

Puglia

Sardegna

Sicilia

Toscana

Umbria

Valle d'Aosta

Veneto

Latest Rt

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-24-2eeac138f288> in <module>
      3 for r, region in enumerate(countries):
      4     steps = .4/(len(percs)-1)
----> 5     for i, (upper, lower) in enumerate(zip(uppers[r,:,-1], lowers[r,:,-1])):
      6         alpha = 1-(i*steps)-(.6*np.ceil(i/len(percs)))
      7         ax.fill_between([r*2,r*2+1], 1, 12, color="red", zorder=0)

IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed

MCMC diagnosis

sigma

Rt

Prediction tests

model {
    ###################################
    # Rt estimation
    ###################################
    # Overarching Rt standard deviation
    tau_R ~ dgamma( 1000 , 100 )
    sigma_R <- 1 / sqrt( tau_R )

    # Serial interval distribution
    SI_mu <- 7.5
    SI_sd <- 3.4
    SI_sh <- SI_mu^2 / SI_sd^2
    SI_ra <- SI_mu / SI_sd^2
    SI ~ dgamma( SI_sh , SI_ra )
    gamma <- 1 / SI

    # First Rt prior
    R[1] <- 0
    for ( t in 2:T ) {
        # Rt prior for k>0
        Rpr[t] ~ dnorm( R[t-1] , tau_R )  T(0,)
        # Rt prior for k=0
        Rnu[t] ~ dnorm( 0 , tau_R )  T(0,)

        # Define Rt prior
        R[t] <- ifelse( k[t-1]==0 , Rnu[t] , Rpr[t] )
        # Avoid k=0 (undefined Rt)
        K[t] <- ifelse( k[t-1]==0, 1 , k[t-1] )

        # Poisson likelihood
        lambda[t] <- K[t] * exp( gamma * ( R[t] - 1 ) )
        y[t] ~ dpois( lambda[t] )
    }

    ###################################
    # Predictions
    ###################################
    # New cases precision prior
    y_tau ~ dgamma( 1 , 1 )

    for ( t in U:(T-1)) {
        R_pred[t] <- sum(R[(t-P):t]) / P
        y_pred[t,1] <- y[t]
        y_pred_hi[t,1] <- y[t]
        y_pred_lo[t,1] <- y[t]
        PP[t,1] <- PPobs[t]
        PP_lo[t,1] <- PPobs[t]
        PP_hi[t,1] <- PPobs[t]

        for ( p in 2:P ) {
            R_pred_hi[t,p] <- R_pred[t] + 0.32 * (p-1) / (P-1)
            R_pred_lo[t,p] <- R_pred[t] - 0.32 * (p-1) / (P-1)

            # most likely scenario
            y_prior[t,p] ~ dnorm( y_pred[t,(p-1)] , y_tau )  T(0,)
            lambda_pred[t,p] <- y_prior[t,p] * exp( gamma * ( R_pred[t] - 1 ) )
            y_pred[t,p] ~ dpois( lambda_pred[t,p] )
            PP[t,p] <- y_pred[t,p] / tests[t] * 100

            # worst scenario
            y_prior_hi[t,p] ~ dnorm( y_pred_hi[t,(p-1)] , y_tau )  T(0,)
            lambda_pred_hi[t,p] <- y_prior_hi[t,p] * exp( gamma * ( R_pred_hi[t,p] - 1 ) )
            y_pred_hi[t,p] ~ dpois( lambda_pred_hi[t,p] )
            PP_hi[t,p] <- y_pred_hi[t,p] / tests[t] * 100

            # best scenario
            y_prior_lo[t,p] ~ dnorm( y_pred_lo[t,(p-1)] , y_tau )  T(0,)
            lambda_pred_lo[t,p] <- y_prior_lo[t,p] * exp( gamma * ( R_pred_lo[t,p] - 1 ) )
            y_pred_lo[t,p] ~ dpois( lambda_pred_lo[t,p] )
            PP_lo[t,p] <- y_pred_lo[t,p] / tests[t] * 100
        }
    }
}

New cases

Legenda:

  • dotted black line: observed new cases
  • solid blu line: smoothed new cases
  • blu area: most likely predicted new cases HPDI 90%
  • green area: best scenario predicted new cases HPDI 99%
  • orange area: worst scenario predicted new cases HPDI 99%

Percent Positive

Legenda:

  • dotted black line: observed Percent Positive
  • solid blu line: smoothed Percent Positive
  • blu area: most likely predicted Percent Positive HPDI 90%
  • green area: best scenario predicted Percent Positive HPDI 99%
  • orange area: worst scenario predicted Percent Positive HPDI 99%

END: 2020-08-30 17:17:20.787238, Completed in 0:24:51.319689

© 2020 Max Pierini. Thanks to Sandra Mazzoli & Alessio Pamovio

Exported from Italia/Previsioni.ipynb committed by Max Pierini on Wed Sep 2 11:22:05 2020 revision 1, a8f8e6b